This article describes a method to turn astronomical imaging into a random number generator
by using the positions of incident cosmic rays and hot pixels to generate bit streams.
We subject the resultant bit streams to a battery of standard benchmark statistical tests for
randomness and show that they are statistically indistinguishable from a perfect random
bit stream. Strategies for improving and building upon this method are outlined.
1 Introduction

Random numbers are of importance to many sub-fields of science. In observational astrophysics
they are required for diverse uses (Meurers 1969) such as testing for sub-structure in galaxy clusters
(Dressler & Shectman 1988) and Monte Carlo background correction techniques (Pimbblet et al. 2002).
In cryptography, the generation of secure passwords and cryptographic keys is paramount to keeping
communication immune from eavesdropping. Random numbers are also used in selecting winning numbers
for lotteries, including the selection of Premium Bonds in the United Kingdom
(http://www.nsandi.com/products/pb/). Large Monte Carlo computations, however, remain the primary
driver of intensive searches for truly random number generators (e.g. Ferrenberg, Landau, & Wong 1992;
James 1990).

For many purposes we would essentially like to have a long bit stream consisting of 1's and 0's.
Each bit in the stream should be independently generated with equal probability of being a 1 or 0.
Therefore as the length of the stream, n, tends toward infinity, the expected fraction of bits
that are 1 (or 0) tends to 1/2. The traditional method of obtaining such a stream is to use a
pseudo-random number generator (PRNG; e.g. Press et al. 1992). PRNGs typically rely upon the input
of a 'seed' quantity which is then processed using numerical and logical operations to give a stream
of random bits. Whilst such PRNGs are probably sufficient for most (minor) types of applications,
they are clearly predictable if the initial seed is known. This makes PRNGs highly inappropriate for
Monte Carlo-like calculations (Gonzalez & Pino 1999).

A truly random number generator (RNG) should possess qualities that make the bits unpredictable.
The obvious sources for RNGs are those that possess large amounts of entropy or chaos (Vavriv 2003;
Gleeson 2002; Gonzalez & Pino 1999). Examples include radioactively decaying sources (e.g. HotBits;
http://www.fourmilab.ch/hotbits/), electrical noise from a semiconductor diode, and thermal noise.

An overlooked and potentially large source of random numbers is to be found in astronomical
imaging. Imaging at a telescope will inevitably produce unwanted cosmetic features such as cosmic
ray events, satellite trails and seeing effects (blurring due to the movement of the atmosphere, in the
case of ground-based telescopes). It is precisely these features (and in particular cosmic rays) which
potentially make astronomical imaging a good RNG.

This article presents an assessment of astronomical imaging data as a source for a RNG. In Section 2,
we demonstrate how it is possible to generate a stream of random bits from a single astronomical
image. Examples of such bit streams are examined in Section 3 using a battery of statistical tests to
evaluate their randomness. Our findings are summarized in Section 4.

Figure 1: Overview of the processes required to generate a random bit stream from an initial
astronomical image. On a Dell Precision 530 workstation, analyzing an initial image of 2 x 4 k
pixels, the generation of an initial bit stream takes about 50 seconds whilst the de-skewing requires
about 30 seconds (for either method). The average rate of random bit production is approximately
2500 bits per second. Depending upon the number of discarded bits, this figure can range from as low
as 1000 bits per second up to 4000 bits per second.

2 Generating Random Bits 

Assuming that one is in possession of a sample of astronomical images that contain cosmic ray events,
we can proceed to obtain a bit stream from them by following the procedure outlined in Figure 1. We
detail the individual steps below.

Our aim is to detect the locations of any cosmic ray or 'hot spot' (pixel values that are significantly
greater than their local neighbours) in the pixel distribution. For this experiment, we use single-shot
exposures of 300 to 600 seconds from non-overlapping wide-field observations consisting of 2 x 4 k
pixels from Pimbblet and Drinkwater (2004) and their on-going follow-up observations^1. Firstly,
we use the IRAF (http://iraf.noao.edu/) task COSMICRAYS with default parameters to remove the
cosmic rays from the original image. Then, using imarith, we subtract the cosmic ray free image
from the original to create a difference image in which there should be only cosmic rays (Figure 2).
Inevitably, this technique will identify not only true cosmic rays but also anomalously hot pixels from
the distribution. To turn the difference image into a bit stream, we sequentially examine the contents
of each pixel in turn, row by row, column by column. Pixels with a value of zero in the difference
image translate into a 0 for the bit stream whilst those with values greater than zero (the hot pixels)
become 1.
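The final step, converting a difference image into a raw bit stream, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `difference_to_bits` and the toy pixel values are hypothetical, the images are represented as plain nested lists rather than FITS data, and the cosmic ray cleaning itself (done with IRAF in this work) is assumed to have already produced the cleaned image.

```python
def difference_to_bits(original, cleaned):
    """Scan two equally sized images row by row and emit one bit per pixel.

    A pixel becomes 1 if the original exceeds the cosmic-ray-cleaned image
    (a cosmic ray or hot pixel) and 0 otherwise.
    """
    bits = []
    for row_original, row_cleaned in zip(original, cleaned):
        for value_original, value_cleaned in zip(row_original, row_cleaned):
            difference = value_original - value_cleaned
            bits.append(1 if difference > 0 else 0)
    return bits


# Toy 2 x 3 "images" (hypothetical pixel values, not real data).
original = [[10, 10, 250], [10, 90, 10]]
cleaned = [[10, 10, 12], [10, 11, 10]]
raw_bits = difference_to_bits(original, cleaned)
```

For the toy input, the two pixels altered by the cleaning step become 1's and all others become 0's, which already shows the strong excess of 0's discussed next.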

The fraction of pixels identified as cosmic rays (and hot pixels) using this method is typically 2-3
per cent for our exposures. Clearly, there exist more 0's in the bit stream than there are 1's. Moreover,
there are distinct 'holes' in the hot pixel distribution of the difference image where legitimate objects
occurred in the original image (the galaxy in Figure 2). So, whilst there are random events in our bit
stream, it is highly skewed toward 0's.

^1 It is also unnecessary to pre-process these images with flat-fields, for example, as all we are interested in are the
locations of hot pixels. Indeed, our testing has shown that a raw image produces a bit stream just as random as
a post-processed one does. One problem that is encountered is the presence of bad pixels, which always occur in the
same place on a CCD. These should be removed with the FIXPIX (or similar) task before proceeding.

Table 1: Scheme for the von Neumann (1963) de-skewing method. The original bit stream from the
imaging is read in as a sequence of non-overlapping pairs ('Input Pair' column). The output for the
new bit stream is then given in the 'Output' column. Where 'Null' is indicated, nothing is appended
to the new bit stream.

Table 2: A comparison of how the de-skewed bit stream is generated using the von Neumann (1963)
and deliminator methods.

2.1 De-skewing 

To turn our bit stream into a uniformly random distribution, it is necessary to de-skew it (an 'entropy
distillation process'; Rukhin et al. 2001). Here we adopt and investigate two common methods of
de-skewing. The first is that of von Neumann (1963). We read the bit stream generated from the
imaging as a sequence of non-overlapping pairs. The pairs are then transformed into a new bit stream
according to the scheme presented in Table 1. This scheme removes the bias in the original bit stream
at the expense of drastically reducing its overall size, as it removes the long sequences
of 0's associated with legitimate objects (Table 2). The typical reduction for our imaging is in the
range 85-98 per cent, although this is a highly variable parameter.
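A minimal sketch of the von Neumann pairing scheme is given below. The contents of Table 1 are not reproduced in this text, so the output convention used here (the pair 10 yields 1, the pair 01 yields 0, and the pairs 00 and 11 yield nothing) is an assumption based on the usual statement of the method; the function name `von_neumann_deskew` is likewise hypothetical.

```python
def von_neumann_deskew(bits):
    """De-skew a bit list by reading it as non-overlapping pairs.

    Assumed convention: 10 -> 1, 01 -> 0, 00 and 11 -> nothing appended.
    A trailing unpaired bit is ignored.
    """
    output = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            # For an unequal pair, the first bit is the output bit.
            output.append(first)
    return output


skewed = [0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1]
deskewed = von_neumann_deskew(skewed)
```

Because identical pairs are discarded, a stream dominated by 0's shrinks sharply, which is consistent with the large reduction quoted above.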

The second method is to use the hot pixels (or groups of hot pixels; 1's) as deliminators between
long streams of 0's. The lengths of non-overlapping pairs of long streams of 0's are then compared to
each other to generate a 1 or a 0 depending on whether the first stream is longer than the second or
vice versa. If the lengths are equal, nothing is appended to the new bit stream. An example of how both
of these methods work is illustrated in Table 2. The clear disadvantage of the deliminator method is
that a much smaller bit stream is produced than for the von Neumann method.
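The deliminator method can be sketched in the same way. Several details below are assumptions made for illustration only: a 1 is emitted when the first run of 0's is longer and a 0 when the second is longer, runs at the very start and end of the stream are treated like any other run, and the function names are hypothetical.

```python
from itertools import groupby


def zero_run_lengths(bits):
    """Lengths of maximal runs of 0's; any group of 1's acts as one deliminator."""
    return [len(list(group)) for value, group in groupby(bits) if value == 0]


def deliminator_deskew(bits):
    """Compare non-overlapping pairs of zero-run lengths.

    Assumed convention: first run longer -> 1, second run longer -> 0,
    equal lengths -> nothing appended.
    """
    runs = zero_run_lengths(bits)
    output = []
    for i in range(0, len(runs) - 1, 2):
        first, second = runs[i], runs[i + 1]
        if first > second:
            output.append(1)
        elif first < second:
            output.append(0)
    return output


skewed = [0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
deskewed = deliminator_deskew(skewed)
```

In this toy stream the zero-run lengths are 3, 1, 2, 2, 1, 3, so eighteen input bits yield only two output bits, illustrating why this method produces a much shorter stream.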

3 Evaluating the Randomness 

In a truly random bit stream, each bit should be generated with probability 1/2 of producing either a
0 or 1. Further, each bit should be generated independently of any other bit in the bit stream. One
should not, therefore, be able to predict the value of a given bit by examining the values of the bits
generated prior to it in the bit stream. These conditions define an ideal, truly random bit stream and
we will use them as the standard against which to test our random bits.

To evaluate the randomness of our bit stream, we subject it to a battery of benchmark statistical
tests. The tests we use are a selection of those devised by the Random Number Generation and
Testing collaboration of the National Institute of Standards and Technology (NIST; Maryland, USA;
http://csrc.nist.gov/rng/index.html). The NIST statistical test suite source code is freely available to
download from their web site.

Figure 2: Example of the image processing method. Top: a sub-section of the original image measuring
613 x 480 pixels. Middle: the cosmic ray rejected version of the image. Bottom: the difference image.
Note how the obvious cosmic ray (left of centre) is rejected along with a host of other relatively
'hot' pixels. Real objects, meanwhile, leave an obvious hole in the difference image which requires
de-skewing to generate a random bit stream.

Table 3: Possible configuration of the conclusions of any of the given statistical hypothesis tests
(Rukhin et al. 2001). H0 is the hypothesis that the bit stream is random.

True Situation   Result
                 Accept H0       Reject H0
H0 true          Correct         Type I error
H1 true          Type II error   Correct

For each test, the software formulates a specific null hypothesis (H0) and alternative hypothesis
(H1). We specify that H0 is the hypothesis that our bit stream is random and that H1 is the
hypothesis that it is non-random. To accept or reject H0, one determines a test statistic and compares
it to a critical value chosen to be in the tails of a theoretical reference distribution of the test statistic.
The possible outcomes of the statistical testing are illustrated in Table 3. The probability of obtaining
a Type I error (Table 3) is the level of significance of a test (see Rukhin et al. 2001), which
we set at a level of 0.01 for this work.

The software determines a P-value for each test: the probability that a perfect RNG would
produce a bit stream that is less random than the bit stream that we test (i.e. a P-value of zero
denotes a bit stream that is certainly non-random). Therefore to reject the null hypothesis, H0 (and
hence fail the test), at a 99 per cent confidence level would require a P-value < 0.01. Clearly, the
P-value only assesses the relative incidence of Type I errors (Table 3). What it does not describe is
the probability that a non-random number generator could produce a sequence of numbers at least as
random as the bit stream being tested (a Type II error; Table 3; Rukhin et al. 2001).

Here, we briefly describe each test (the tests and the statistics behind them are described in much
more detail in Rukhin et al. 2001) and summarize the results in Table 4.

If our bit stream is random, then the number of 1's and 0's overall and in any part (i.e. any
sub-sequence) of it should be approximately the same. Therefore the first test is to examine the frequency
of 1's and 0's in the bit stream. In the second test, these frequencies are re-computed for sub-sequence
blocks of length M (see also Knuth 1981; Pitman 1993).
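As an illustration of the first of these tests, the sketch below computes a frequency (monobit) P-value following the formulation in Rukhin et al. (2001): the bits are mapped to +1 and -1, summed, and the normalized sum is converted to a P-value with the complementary error function. The function name is hypothetical, and this is not the NIST code itself. The block frequency test is not covered here.

```python
import math


def frequency_p_value(bits):
    """Frequency (monobit) test P-value for a list of 0/1 bits."""
    n = len(bits)
    # Map 0 -> -1 and 1 -> +1, then sum; a balanced stream sums to ~0.
    total = sum(2 * bit - 1 for bit in bits)
    # P = erfc(|S_n| / sqrt(n) / sqrt(2)) = erfc(|S_n| / sqrt(2 n)).
    return math.erfc(abs(total) / math.sqrt(2 * n))


example = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
p_value = frequency_p_value(example)
passed = p_value >= 0.01  # significance level adopted in this work
```

A heavily skewed stream, such as the raw bits before de-skewing, gives a P-value far below 0.01 with this statistic.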

Next, we test the number of runs in the sequence, where a run is defined as an uninterrupted
sequence of identical bits. This tests whether the bit stream oscillates sufficiently quickly between 1's
and 0's. We follow this test by a similar one that evaluates the longest run of 1's in the sequence to
determine if it is the same as would be expected for a random distribution. If there is an irregularity
in the longest run length of 1's, then this will also be reflected in the longest run length of 0's; hence
we only test the longest run length of 1's.
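The runs test can be sketched along the same lines, again following the formulation in Rukhin et al. (2001) rather than reproducing the NIST source code. The function name is hypothetical. The early return of 0.0 reflects the prerequisite in that formulation that the stream must first be close to balanced.

```python
import math


def runs_p_value(bits):
    """Runs test P-value for a list of 0/1 bits."""
    n = len(bits)
    ones_fraction = sum(bits) / n
    # Prerequisite: the proportion of 1's must be close to 1/2.
    if abs(ones_fraction - 0.5) >= 2 / math.sqrt(n):
        return 0.0
    # Number of runs = 1 + number of positions where the bit value changes.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    spread = ones_fraction * (1 - ones_fraction)
    expected_runs = 2 * n * spread
    return math.erfc(abs(runs - expected_runs) / (2 * math.sqrt(2 * n) * spread))


example = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
p_value = runs_p_value(example)
```

Too few runs (long stretches of identical bits) and too many runs (rapid alternation) both drive the P-value toward zero.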

To test for any linear dependence of sub-strings of fixed length within the original sequence, the
rank of disjoint sub-matrices is examined. This method is described in more detail by Marsaglia in
the DIEHARD statistical tests (http://stat.fsu.edu/~geo/diehard.html).

We can also consider the bit stream as a random walk and hence test the maximal excursion from
zero for the cumulative sum (cusum) of adjusted digits (+1, -1) in our bit stream or a sub-sequence
thereof (Revesz 1990). For a random sequence, this cusum should be near zero. Finally, by performing
a discrete Fourier transform (DFT), we can look for periodic features in our bit stream that would
indicate a lack of randomness.
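The statistic underlying the cusum test is simple to compute, as the sketch below shows. Only the maximal excursion is calculated here; converting it into a P-value requires the reference distribution given in Rukhin et al. (2001), which is not reproduced. The function name is hypothetical.

```python
from itertools import accumulate


def max_excursion(bits):
    """Largest absolute value reached by the random walk built from the bits."""
    steps = (2 * bit - 1 for bit in bits)  # 0 -> -1, 1 -> +1
    return max(abs(partial_sum) for partial_sum in accumulate(steps))


example = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
z = max_excursion(example)
```

For this ten-bit example the partial sums are 1, 0, 1, 2, 1, 2, 1, 2, 3, 4, so the maximal excursion is 4.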

Table 4 shows that both variants of the de-skewing method are sufficient to pass all of the standard
tests outlined above by at least the minimum pass rate (Rukhin et al. 2001). We can further assess
the validity of our conclusion by examining the distribution of P-values, which for a random sequence
should be uniform. For this, we re-run our experiment, but use 1000 bit streams of length 100,000
(since 100 bit streams comprise a relatively small sample). All the tests outlined above (Table 4)
are passed once again and we display the distribution of P-values for these in Figure 3. All of the
distributions of P-values approximate uniformity very well. To test the uniformity of the distributions,
we can use a χ² test and a determination of a P-value that corresponds to the goodness-of-fit
distributional test of the P-values (a so-called 'P-value of P-values'; Rukhin et al. 2001). The χ²
statistic is simply:

Table 4: The proportion of 100 bit streams of length n = 1,000,000 that passed each statistical test
(critical P-value = 0.01). The minimum pass rate in order for our sequence to be considered random
is approximately 0.96 for each statistical test (see Rukhin et al. 2001).

Test              Proportion Passed            Notes
                  von Neumann   deliminator
Frequency         0.98          1.00
Block Frequency   0.98          1.00           M = 1000
Runs              0.97          0.99
Longest Run       0.99          0.97
Rank              1.00          0.99
Cusum             0.97          0.99
DFT               1.00          1.00

\chi^2 = \sum_{i=1}^{10} \frac{(F_i - 100)^2}{100}    (1)

where F_i is the number of P-values in bin i of Figure 3. The P-value of P-values is then the
complemented incomplete gamma function:

1 - \Gamma(a, z)    (2)

where we set a = 9/2 and z = χ²/2 (see Rukhin et al. 2001). This yields a mean χ² value for the
distributions in Figure 3 of 9.9 ± 2.8 whilst 1 - Γ(9/2, χ²/2) = 0.357. Since 1 - Γ(9/2, χ²/2) is much
larger than (say) 0.0001, we can consider the distributions to be uniformly spread.
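The uniformity check described above can be sketched using only the Python standard library. The function names are hypothetical. The expected count per bin is written as the number of P-values divided by ten, which equals the value of 100 in Equation 1 for 1000 bit streams. The complemented incomplete gamma function for a = 9/2 is evaluated through the standard recurrence Q(a+1, z) = Q(a, z) + z^a e^(-z) / Γ(a+1), starting from Q(1/2, z) = erfc(√z); this is a convenience chosen for this sketch, not the routine used by the NIST software.

```python
import math


def chi_squared_uniformity(p_values, bins=10):
    """Chi-squared statistic for P-values sorted into equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in p_values:
        index = min(int(p * bins), bins - 1)  # a P-value of 1.0 joins the last bin
        counts[index] += 1
    expected = len(p_values) / bins
    return sum((count - expected) ** 2 / expected for count in counts)


def igamc_nine_halves(z):
    """Complemented (regularized upper) incomplete gamma function Q(9/2, z)."""
    q = math.erfc(math.sqrt(z))  # Q(1/2, z)
    for k in range(4):  # step a from 1/2 up to 9/2
        a = k + 0.5
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
    return q


def p_value_of_p_values(chi_squared):
    """P-value of P-values for ten bins (a = 9/2, z = chi_squared / 2)."""
    return igamc_nine_halves(chi_squared / 2)


result = p_value_of_p_values(9.9)
```

Evaluating the sketch at the mean χ² of 9.9 gives a value of about 0.36, in line with the 0.357 quoted above given the rounding of the mean.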

We note, however, that by reducing the size of the bit stream (say to n = 10,000), we have
been able to cause the von Neumann de-skewed variant to fail the runs test. This emphasizes the fact
that these tests need to be carried out on a large bit stream sample (n > 100,000).



4 Summary 

We have described how astronomical imaging can be used as a true RNG by application of simple
cosmic ray rejection algorithms. Although we throw away a large fraction of our original data through
the de-skewing methods, we have shown that the resultant bit stream is sufficiently random to pass
modern tests for randomness.

The tests that we applied are only a selection of the NIST statistical test suite. There are more
within this suite and certainly more beyond (e.g. Tu & Fischbach 2003; Ballesteros & Martín-Mayor
1998; Knuth 1981). We have therefore looked at applying more complex tests for randomness, as
detailed in the NIST test suite (e.g. non-overlapping template matching, etc.). We find that these
additional tests are readily passed by both de-skewed variants of our bit streams.

Several improvements to our methodology can potentially be made. We are looking at different
de-skewing techniques to improve the test statistics. For example, the von Neumann method can be
used twice (or more) on the bit stream generated from the image. The resulting proportions of bit
streams that pass the statistical tests in Table 4 increase fractionally as a result, but not significantly.
Our next step is to attempt to create a web interface where it will be possible to download random
numbers in real time using this method. This could be accomplished by using the international
network of continuous cameras (concams; Nemiroff & Rafert 1999). The concams have the virtues
that one does not require the sky to be dark locally and the images are freely available to the public.

Figure 3: Histograms of the P-value distributions arising from applying the seven statistical tests from
Table 4 to 1000 bit streams of length 100,000. All of the distributions are approximately uniform.

The imaging used in this work is at optical wavelengths (specifically B, V, R and I-bands). It
may be interesting to examine how the test results vary with other parts of the electromagnetic
spectrum, if at all. The imaging is also non-overlapping. If the concams are to constitute a valid RNG,
then it is worthwhile to confirm that images of the same area of sky produce independent random bit
streams, which should be the case as we are only considering the incidence of hot pixels (i.e. cosmic
rays).

Accessory Materials 

One sequence of 1,000,000 bits (von Neumann de-skewed; approximately 1.1 Mb in size) used in this 
work will be presented in the accessory materials available from PASA online. Please note that this 
bit stream is only one small part of the much larger sample used to generate the results presented in 
this work. 